Feature Analysis

The standard features: LLD (low-level descriptors)
File path: \emotiondetection\features_labels_lld

Load the data

The Data class is defined in common.py.
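A minimal sketch of what the Data class might look like (this is a hypothetical reconstruction, not the actual common.py; the file names and directory layout are assumptions):

```python
import os
import numpy as np

class Data(object):
    """Loads pre-extracted feature/label matrices for one feature set.

    NOTE: hypothetical reconstruction; the real class lives in common.py
    and its on-disk layout may differ.
    """
    def __init__(self, name, root='features_labels_lld'):
        self.name = name  # feature-set name, e.g. 'lld'
        self.root = root  # directory holding the .npy files (assumed)

    def load_training_data(self):
        # assumed file names: <name>_train_features.npy / <name>_train_labels.npy
        self.feature = np.load(os.path.join(self.root, self.name + '_train_features.npy'))
        self.label = np.load(os.path.join(self.root, self.name + '_train_labels.npy'))

    def load_test_data(self):
        self.feature_test = np.load(os.path.join(self.root, self.name + '_test_features.npy'))
        self.label_test = np.load(os.path.join(self.root, self.name + '_test_labels.npy'))
```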

Get the training data:


In [48]:
import numpy as np
import os
from sklearn.manifold import TSNE

from common import Data

lld = Data('lld')
lld.load_training_data()
print 'training feature shape: ', lld.feature.shape
print 'training label shape: ', lld.label.shape

#lld.load_test_data()
#print 'test feature shape: ',lld.feature_test.shape
#print 'test label shape: ',lld.label_test.shape

training feature shape:  (9959L, 384L)
training label shape:  (9959L, 2L)

a. histogram

Plot histograms of a few features to see how they are distributed.


In [42]:
import matplotlib.pyplot as plt
%matplotlib inline  

feature_table = [1, 10, 100, 300]
for ind, fea in enumerate(feature_table):
    f = lld.feature[:, fea]

    plt.subplot(2, 2, ind+1)
    plt.hist(f)
    #plt.title("Histogram of feature " + str(fea))
plt.axis('tight')


Different features have different distributions.
Some appear to follow a Gaussian distribution.
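The eyeballed Gaussianity can be checked numerically; a sketch using sample skewness on synthetic stand-ins for two feature columns (real usage would slice columns out of lld.feature):

```python
import numpy as np

def skewness(x):
    """Sample skewness: near 0 for a symmetric (e.g. Gaussian) distribution."""
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    return ((x - m) ** 3).mean() / s ** 3

rng = np.random.RandomState(0)
gaussian_like = rng.normal(size=10000)  # stands in for a Gaussian-looking feature
skewed = rng.exponential(size=10000)    # stands in for a clearly non-Gaussian one

g_skew = skewness(gaussian_like)  # close to 0
e_skew = skewness(skewed)         # close to 2, the exponential's theoretical skewness
```

A strongly skewed column is a hint that Gaussian-assuming models (e.g. naive Bayes with Gaussian likelihoods) may fit it poorly.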

b. t-SNE

Use t-SNE to project the data to two dimensions and visually inspect how separable the classes are.


In [43]:
model = TSNE(n_components=2, random_state=0) # reduce the dimension to 2 for visualization
np.set_printoptions(suppress=True)
Y = model.fit_transform(lld.feature) # the reduced data (t-SNE is unsupervised, so labels are not needed)

In [47]:
plt.scatter(Y[:, 0], Y[:, 1], c=lld.label[:, 0], cmap=plt.cm.Spectral)
plt.title('training data')
plt.axis('tight')
 
print Y.shape


(9959L, 2L)

The classes are barely separable in the embedding : (
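The eyeball judgement can be quantified, for instance by the fraction of points whose nearest neighbour carries the same label; a sketch on synthetic 2-D point clouds standing in for the embedding Y (numpy only, no extra dependencies):

```python
import numpy as np

def nn_label_agreement(X, y):
    """Fraction of points whose nearest other point shares their label.

    Close to 1.0 for well-separated classes, close to chance level
    for heavily overlapping ones.
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)   # exclude each point itself
    nn = d.argmin(axis=1)         # index of the nearest neighbour
    return (y[nn] == y).mean()

rng = np.random.RandomState(0)
n = 200
labels = np.repeat([0, 1], n)

# Two well-separated blobs vs. two heavily overlapping ones.
separated = np.vstack([rng.randn(n, 2), rng.randn(n, 2) + 8.0])
overlapping = np.vstack([rng.randn(n, 2), rng.randn(n, 2) + 0.3])

sep_score = nn_label_agreement(separated, labels)    # near 1.0
ovl_score = nn_label_agreement(overlapping, labels)  # near 0.5 (chance for 2 classes)
```

Running the same function on (Y, lld.label[:, 0]) would put a number on how mixed the classes look in the scatter plot.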

c. analyze which classification methods theoretically suit our data

  • Training data:
    9959 examples and 384 features.
    5 classes

  • The most widely used classifier, SVM: works well for 2 classes; a large feature dimension -> long computing time. $\surd$
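A sketch of how a linear SVM could be tried on data of this shape (here on a synthetic 5-class stand-in with 384 features, since only the shapes match our data; scikit-learn's LinearSVC handles the multi-class case one-vs-rest):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
n_per_class, n_features, n_classes = 200, 384, 5

# Synthetic stand-in: one Gaussian blob per class, with shifted centers.
X = np.vstack([rng.randn(n_per_class, n_features) + 3.0 * rng.randn(n_features)
               for _ in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

clf = LinearSVC(C=1.0)
clf.fit(X, y)
train_acc = clf.score(X, y)  # high on these well-separated blobs
```

On the real lld.feature the fit would be slower (9959 x 384) and, given the poor separability seen above, accuracy would be much lower; a non-linear kernel or feature selection might help.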